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Abstract 

We introduce a powerful method for dynamical reconstruction of 
long-lived tracers such as ozone. It works by correlating the principal 
components of a matrix representation of the tracer dynamics with a 
series of sparse measurements. The method is tested on the 500 K 
isentropic surface using a simulated tracer and with ozone measure- 
ments from the Polar Aerosol and Ozone Measurement (POAM) III 
satellite instrument. The Lyapunov spectrum is measured and used 
to quantify the lifetime of each principal component. Using a 60 day 
lead time and five (5) principal components, cross validation of the 
reconstructed ozone and comparison with ozone sondes return root- 
mean-square errors of 0.20 ppmv and 0.47 ppmv, respectively. 
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1 1 Introduction 
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Randall et al. 1 (l2002h demonstrate a method for proxy tracer reconstruction 



3 of ozone that works by correlating sparse measurements from satellites or 

4 other remote-sensing instruments with conserved tracer fields such as po- 

5 tential vorticity. The proxy tracer method is an approximate interpolation 

6 method, appropriate for long-lived tracers such as ozone, that takes into ac- 

7 count the wind dynamics. Here we demonstrate a similar method in which 

8 the tracer dynamics are represented as a matrix. The matrix is decom- 

9 posed using principal component analysis (PCA), also called singular value 

10 decomposition (SVD), and the largest principal components then fitted to 

11 the measurements. 



12 2 Background 

13 Consider a system of ordinary differential equations: 

/(^, t) (1) 



dx 
'dt 

14 We linearize this about x: 

d 



^^,a; + 5x) ^ f + Vf-6x (2) 
^^Sx ^ Vf x (3) 



15 Define H such that: 



H{t = 0) = I (5) 

16 where / is the identity matrix. This is what is known as the tangent model 

17 which we decompose using principal component analysis (PCA), also called 

18 singular value decomposition (SVD) or empirical orthogonal function (EOF) 

19 analysis: 

H = U-S-V^ (6) 

20 where U and V are orthonornaal ma trices while S is diagonal and contains the 

21 singular values (jPress et al.l . Il993l ). The matrix U contains the left singular 
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vectors while y contains the right singular vectors. The terms principal 
component (PC) and singular vector will be used interchangeable to denote 
a left singular vector. 

Singular vectors are increasingly being used in meteorology to quantify 
the predi ctability of a fore cast or to generate perturbations for ensemble 
forecasts (iTang et all l2006l ). An Eulerian tracer simulation is linear, that is 
the tangent vector is simply the dynamics. Thus: 



Xj ~ Hj • xq 



(7) 



where is a gridded representation of a passive flow tracer at the 

jth time step. Hj = Aj ■ Aj^i ■ Aj^2 ■ ... A3 ■ A2 ■ Ai is the tracer dynamics 
and the matrix Aj maps Xj-i to Xj. We can say that: 



A, 



V/dt 



(8) 
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The Lyapunov exponents are defined as the logarithms of the time aver- 
ages of the singular values in the limit as time goes to infinity: 



A,: 



lim — log Si; 

i— i-oo Zt 



(9) 
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where Sj is the ith singular value (lOttl . Il993[ l. For most systems: 

\Sx\^\5x{0)\exp{Xi) (10) 

That is, as H is integrated forward, the largest singu lar value a nd the largest 
singular vector will increasingly begin to dominate ( lOttl . Il993l ). 



37 3 Numerics and data 

3B To run the tracer advection, the ctraj software package is used (http: / / ctraj .sf.net) . 

39 The codes are written in C++ and contain programs for gridded, semi- 

40 Lagrangian tracer advection on an azimuthally-equidistant-projected coor- 

41 dinated system. Two fields are advected simultaneously, one for the North- 

42 ern hemisphere and one for the Southern hemisphere, with equator crossings 

43 accounted for. Gridding on both hemispheres is 100 by 100, or 200km-, 

44 1.8 degree-latitude-separation at the pole. Output is written to a series 
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of sparse matrices whi ch are then decomposed with the La nczos method 



(IGolub and Loan! . 119961 ) using the Arnoldi package (ARPACK) (ILehoucq and Scott 
1996h. 



The Polar Ozone and Aerosol Measurement (POAM) III instrument is 
a solar- occultation instrument mounted on a su n -sync hronous, low-earth- 
orbit satellite. Using optimal estimation iRodger i fl2000[ ). ozon e profiles are 



retrie ved within a narrow latitude band in either polar region (iLucke et al 



19991 ) ■ It is capable of returning 28 or 29 measurements per day, alternating 



between Northern and Southern hemisphere, however because of a malfunc- 
tion in the instrument, it normally operates in only one or the other hemi- 
sphere for longer periods. Therefore, we confine ourselves to earlier data, 
October and November 1998, when more frequent and diverse measurements 
are available. 

The National Center for Environmental Prediction (NCEP) supplies, free- 
of-charge, gridded (2.5 by 2.5 degrees longitude /lati t ude, 4 time daily), re- 
analyzed climate data starting in 1948 f lKalnav et aLl . ll996[ ). Wind and tem- 
perature data is used to drive the advection model. 



4 Tracer correlation 
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Two differently-initialized tracers, when integrated with the same wind fields 
over a long time period, become correlated. This can be used to infer global 
fields of a long-lived tracer su c h as ozone based on only a few sparse measure- 



ments (lAUen and Nakamural . l2003t iRandall et al.l . |2002| ) . Figure [T] demon 



strates this with the extreme example of an initially zonally-symmetric tracer 
and an initialy meridionally-symmetric tracer. Tracers are advected with Na- 
tional Center fo r Environmental Pre diction (NCEP) reanalysis 1 data at the 



500 K isentrop flKalnav et all . [19961) 



We also plot the correlation of the first tracer with the largest singular 
vector. We see that, because of Equation ffTOj) . they too become correlated 
over time. This at least partially explains the efficacy of the proxy tracer 
method. A sample PC as compared with the tracer is shown in Figure [21 

Figure [3] plots the time evolution of the singular values. From this we can 
calculate the Lyapunov spectrum by making straight line fits of thei r loga- 
rithm s. While the resulting fields may develop into complex fractals (iMillsl . 



2009[ ) the Lyapunov spectrum shows that the tracer dynamics themselves 



are not truly chaotic, but are only on the cusp: the largest singular value 
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Tracer correlation 



(X 
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Figure 1: The correlation over time of two differently- initialized tracers (bro- 
ken line)-zonally symmetric and meridionally symmetric-and of the zonally- 
symmetric- initialized tracer with the first principal component. The simula- 
tion was driven with NCEP reanalysis 1 data on the 500 K isentropic level 
with an Eulerian time-step of six (6) hours and a Lagrangian time-step of 
one (1) hour. 



5 



Figure 2: Comparison of a simulated tracer (a.) and the first principal 
component (b.) for the same time integration. 

80 remains approximately constant. It also shows how quickly the other singu- 

81 lar vectors decay, so that the largest will eventually dominate in accordance 

82 with Equation [TUl 

83 5 Principal component proxy 

84 Principal component proxy analysis works by linearly regressing the largest 

85 singular vectors with the measurements. Since the measurements will likely 

86 not occur all at the same time, we must first generate the right singular 

87 vectors for a given tracer mapping, Hj, at lead time j, then generate a 
83 series of left singular vectors for different time steps by applying the tracer 

89 mapping. For the right singular vectors, ARPACK is used to compute the 

90 eigenvectors of Hj ■ Hj. Values interpolated within the right singular vectors 

91 at the measurement locations are fitted through coefficients, {q}, to the 

92 measurements. Thus, we need to fit the following equation: 



m 




(11) 



93 where m is the number of singular vectors used in the analysis, is the 

94 measurement at time step k (we assume that each measurement has a unique 
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Figure 3: Plot of the top five (5) singular-values of a semi-Lagrangian tracer 
simulation over time. Straight-line fits return the Lyapunov-spectrum. The 
simulation was done on the 500 K isentropic level. 
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Figure 4: Comparison of simulated tracer fields with the reconstructed ver- 
sion, a. and b. compare the initial field of the simulated versus the recon- 
structed, respectively, while c. and d. compare the fields at the lead time, 
60 days later. 
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Zonally sym. initialized tracer vs. reconstructed 
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Figure 5: Correlation over time of a zonally-symmetric-initialized, passive 
tracer with one reconstructed using PC proxy using a lead time of 60 days, 
marked by the vertical, dashed line. 
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time stamp), iJ^ is the tracer mapping at time k and Wk is a vector of 
i nterpolation coeffic ients. The fitting is done using a hnear least squares 
f lGalassi et aD . l2007h . 

To perform the analysis, we need to choose a lead time, as well as a 
measurement window. The lead time determines how long the tracer is ad- 
vected before performing the SVD. Measurements are selected within the 
measurement window which is centered at the end of the lead time. 

Figure m and [5] shows the results of a test retrieval for a zonally-symmetric- 
initialized tracer using a lead time of 60 days and five singular vectors. The 
Lyapunov spectrum can help us select the number of singular vectors as it 
shows how many remain significant at a given lead time-see Figure [31 Ten 
sparse measurements were randomly selected in space and time within a 
measurement window of one day-these are plotted in Figure [61 The Pear- 
son correlation for the initial field (Figures [1^. and b.) is 0.875, while the 
correlation at the lead time (Figures [It. and d.) is 0.99 
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110 6 Ozone reconstruction 



111 The purpose here is not to perform a rigorous vahdation, but rather to 

112 demonstrate proof- of- concept. To do this we perform a cross-vahdation exer- 

113 cise on Polar Ozone and Aerosol Measurement (POAM) III satellite retrievals 
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114 ( iLucke et al.l . Il999l ) in which the data is divided into two parts: a training 

115 data set and a test data set. The two are divided approximately evenly: 

116 each measurement is selected at random to go into one or the other set. The 

117 analysis is done once again at the 500 K isentropic level; a lead time of 60 
days and a measurement window of 2 days is used. 



Validation results are shown in Figure [71 Unlike in iRandall et al.l ( 120021 ). 
the reconstruction is done over the entire globe. Since ozone concentrations 
are generally lower in the Southern hemisphere, this will produce an artifi- 

122 cially high skill score, thus we also include comparisons limited to each of 

123 the two hemispheres. Figure [Tli. shows the error statistics, normalized by 

124 the original error from the POAM inversion. Note that the standard devia- 

125 tion for this error is almost exactly one (1), meaning that the error for the 

126 reconstructed ozone is about the same as the original measurement error. 

127 Finally, we compare the ozone fields reconstructed from the POAM data 

128 with ozone sonde m easurements from the World Ozone Data Centre (WODC) 



129 ( iHare et al.l . |2000[ ). Validation results are shown in Figure El Once again, the 

130 comparison is done for both Northern and Southern hemispheres exclusively, 

131 using the globally reconstructed ozone, and error statistics are shown in 

132 the final figure, Eli. Root-mean-square error for the ozone sondes is 0.45 

133 /imol/mol (volume-mixing-ratio (vmr) in parts-per-million (ppm): ppmv) 

134 while for the cross-validation (statistics are graphed with the dotted line) it 

135 is 0.20 /imol/mol. 

136 A sample reconstructed ozone field is shown in Figure |9l Note that for the 

137 sample field, values towards the equator and the lower latitudes are suspect, 

138 discussed in more detail in Section |71 below. The launch stations are plotted 

139 on the field and, like the POAM measurements, are mostly restricted to the 

140 higher and mid-latitudes. 

141 7 Discussion and conclusions 

142 Further work needs to be done to determine the optimal number of PCs to 

143 use in an analysis as well as the optimal lead times. Preliminary work shows 
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Figure 7: a. Scatter plot of PC proxy cross-validation results with POAM III 
ozone data on the 500 K isentrop for November and December 1998, 60 day 

lead time and 2 day measurement window, b. and c. show comparison results 
restricted to Northern and Southern hemisphere respectively, c. Histogram 
of errors normalized by original error estimates from the POAM inversion. 
The sohd vertical lines shows the average, while the dashed lines show the 
standard deviation. 
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Figure 8: Ozone reconstructed from POAM versus ozone sonde data. Figures 
b. and c. restrict the comparison to the Northern and Southern hemispheres 
respectively. Figured, shows error statistics (histogram bars) . Solid vertical 
line is the average, while the dashed lines show the standard deviation. Error 
statistics for the cross-validation are shown for comparison (dotted line). 
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Reconstructed ozcne; 500K, 1998/10/05-06 
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Figure 9: Sample ozone field reconstructed using the PC proxy technique 
from POAM III data at the 500 K isentrop. Triangles show the locations of 
the ozone sonde launch stations used in the validation exercise. 
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144 that beyond a lead time of about 60 days, skill scores differ little. Presum- 

145 ably, reconstruction of shorter-(longer-)lived tracers would work better with 

146 shorter (longer) lead times. Naturally, a longer lead time means longer com- 

147 pute times. The number of PCs required will be related in part to the lead 

148 time, as discussed in Section |5l with shorter lead times requiring more PCs. 

149 Obviously, a larger number of measurements will support the use of more 

150 PCs. 

151 Another problem in some of the reconstructed fields is that the solution 

152 tended to blow up towards the equator, showing ringing, negative concentra- 

153 tions, and other artifacts. Including a constant term in the fitting procedure 

154 did little to alleviate the problem. A better solution would be to use some 

155 form of regular ization to constrain the solution, as in optimal estimation 
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156 f lRodgersl . l2000f ). or better still, include more measurements in the analysis. 

157 Most of the ozone sondes were also launched in the higher latitudes, so this 

158 did not significantly affect the comparison results. 

159 Nonetheless, principal-component (PC) proxy tracer analysis is shown to 
be a powerful, dynamical interpolation method capable of reconstructing a 
passive tracer using as few as ten (10) sparse measurements. Ozone recon- 

162 structed from the Polar Ozone and Aerosol (POAM) III instrument showed 

163 reasonable agreement with ozone sondes, despite using, on average, 52 mea- 

164 surements per field and these limited to a narrow latitude bands in either 

165 hemisphere. 

166 The method has the advantage over more tradition proxy tracer anal- 
ysis in that it provides more degrees of freedom from the higher principal 
components thus can account for more recent changes in the tracer. 

The ozone reconstruction for the cross-validation exercise was so good, in 
fact, that the errors were of the same order as the original POAM retrievals. 



suggesting that these error estimates are too high to begin with. 



172 Software for this project can be found at: ,http : // ctraj . sf . net 
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